A Coalescent Model for Genotype Imputation

نویسندگان

  • Ethan M. Jewett
  • Matthew Zawistowski
  • Noah A. Rosenberg
  • Sebastian Zöllner
چکیده

The potential for imputed genotypes to enhance an analysis of genetic data depends largely on the accuracy of imputation, which in turn depends on properties of the reference panel of template haplotypes used to perform the imputation. To provide a basis for exploring how properties of the reference panel affect imputation accuracy theoretically rather than with computationally intensive imputation experiments, we introduce a coalescent model that considers imputation accuracy in terms of population-genetic parameters. Our model allows us to investigate sampling designs in the frequently occurring scenario in which imputation targets and templates are sampled from different populations. In particular, we derive expressions for expected imputation accuracy as a function of reference panel size and divergence time between the reference and target populations. We find that a modestly sized "internal" reference panel from the same population as a target haplotype yields, on average, greater imputation accuracy than a larger "external" panel from a different population, even if the divergence time between the two populations is small. The improvement in accuracy for the internal panel increases with increasing divergence time between the target and reference populations. Thus, in humans, our model predicts that imputation accuracy can be improved by generating small population-specific custom reference panels to augment existing collections such as those of the HapMap or 1000 Genomes Projects. Our approach can be extended to understand additional factors that affect imputation accuracy in complex population-genetic settings, and the results can ultimately facilitate improvements in imputation study designs.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Genotype imputation in a coalescent model with infinitely-many-sites mutation.

Empirical studies have identified population-genetic factors as important determinants of the properties of genotype-imputation accuracy in imputation-based disease association studies. Here, we develop a simple coalescent model of three sequences that we use to explore the theoretical basis for the influence of these factors on genotype-imputation accuracy, under the assumption of infinitely-m...

متن کامل

Estimation of genotype imputation accuracy using reference populations with varying degrees of relationship and marker density panel

Genotype imputation from low-density to high-density (SNP) chips is an important step before applying genomic selection, because denser chips can provide more reliable genomic predictions. In the current research, the accuracy of genotype imputation from low and moderate-density panels (5K and 50K) to high-density panels in the purebred and crossbred populations was assessed. The simulated popu...

متن کامل

Imputation of parent-offspring trios and their effect on accuracy of genomic prediction using Bayesian method

The objective of this study was to evaluate the imputation accuracy of parent-offspring trios under different scenarios. By using simulated datasets, the performance Bayesian LASSO in genomic prediction was also examined. The genome consisted of 5 chromosomes and each chromosome was set as 1 Morgan length. The number of SNPs per chromosome was 10000. One hundred QTLs were randomly distributed a...

متن کامل

A generic coalescent-based framework for the selection of a reference panel for imputation.

An important component in the analysis of genome-wide association studies involves the imputation of genotypes that have not been measured directly in the studied samples. The imputation procedure uses the linkage disequilibrium (LD) structure in the population to infer the genotype of an unobserved single nucleotide polymorphism. The LD structure is normally learned from a dense genotype map o...

متن کامل

اهمیت خویشاوندی ژنتیکی و رکورد فنوتیپی بر صحت ژنومی داده‌های جانهی شبیه‌ سازی شده با استفاده از مدل های حیوانی در حضور اثرات متقابل ژنوتیپ و محیط

The objective of this study was to investigate the role of genetic relationships between training and validation set with considering different ratio of phenotypic records of training set on accuracy of genomic prediction via animal models containing genotype × environment interactions in simulated imputation data. For this purpose, four different scenarios using 15k density containing differen...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره 191  شماره 

صفحات  -

تاریخ انتشار 2012